A PARALLEL DECOMPOSITION ALGORITHM
FOR  STAIRCASE  LINEAR  PROGRAMS* 


Robert  Entriken 

Department  of  Operations  Research 
Stanford  University 


Abstract 


As part of an extended research project on the parallel decomposition of linear programs, a
parallel algorithm for Staircase Linear Programs was designed and implemented. This class of
problems encompasses a large range of planning problems and, when decomposed, has simple
subproblem formulations and communication patterns. This makes its solution a manageable
step toward our eventual goal of producing a general code that automatically exploits problem
structures of various forms.

The results presented here were derived from an implementation for a Sequent Balance
8000 shared-memory multiprocessor. The algorithm itself is message-based but can run on
either shared- or distributed-memory parallel computers.


A simple diet planning problem is used to demonstrate the principles of the algorithm's
development and performance. When applied to this problem, the parallel decomposition
algorithm shows promise relative to current serial optimization codes. The nonlinear optimization
code MINOS 5.1 is used both as a basis for comparison and as a generic subproblem solver. The
greatest room for speedup lies in exploiting problem structure. The results show that
decomposition can improve efficiency even with a single processor. Examples are given where multiple
processors lead to still greater efficiency.
1. INTRODUCTION

The  term  Staircase  Linear  Program  (SLP)  describes  a  Linear  Program  (LP)  that  has  a  staircase 
pattern  in  the  nonzero  coefficients  of  its  constraint  matrix,  as  illustrated  in  Figure  1.1.  Each  “step” 
in  the  staircase  typically  corresponds  to  a  collection  of  variables  for  a  “period”  of  a  planning  horizon. 


* This research was supported by the Electric Power Research Institute, the U.S. Air Force Office of Scientific Research,
the U.S. Department of Energy, the National Security Agency, the National Science Foundation, the Science Alliance
Program of the State of Tennessee, and the Department of Operations Research at Stanford University.
Figure 1.1. The classic staircase pattern.

The collections of variables are described as periods because the staircase structure arises most
often from problems that represent systems over time [HL81]. In this report, we study multi-period
or  multi-day  diet  plans  as  examples  of  SLPs.  The  diet  planning  problem  is  a  very  simple  LP  that 
will  help  describe  the  reformulation  and  solution  stages  involved  in  solving  staircase  systems  with 
a  parallel  computer. 

To make proper use of a parallel computer, we must reformulate the original problem into
multiple subproblems and then submit them to multiple processors as a means of obtaining greater
throughput. The subproblems pass messages among themselves in a serial communication network
of the form shown in Figure 1.2, where the circles represent the subproblems and the lines between
them the communication paths. In Figure 1.1, each period's variables interact only with those of
the previous and following periods; the serial structure results directly from this pattern of
interdependencies between variables in the SLP. The medium of communication between processors
in the parallel computer should therefore be able to emulate a serial network.


Figure  1.2.  A  serial  communication  network. 

Abrahamson [Abr83] and Wittrock [Wit83] developed nested dual decomposition.
Their material is summarized here for completeness, in much less detail. Their work focused
on solving such problems with a serial computer. Here we extend the approach to a parallel
computer and adapt their results to prove that the parallel algorithm converges.

Three  major  factors  have  been  identified  that  significantly  affect  the  speed  and  efficiency  with 
which  a  solution  is  obtained  in  this  framework: 

(1)  the  number  of  subproblems  into  which  we  divide  the  SLP, 

(2)  the  number  of  processors  used  to  solve  the  subproblems,  and 

(3)  the  order  in  which  the  subproblems  are  solved. 

A given subproblem may itself be a lower-dimensional SLP containing any number of adjacent
steps of the original staircase.
It will be shown that there are diminishing returns associated with extensive decomposition of
SLPs, and likewise with increasing the number of processors used. The argument against
both further decomposition and the use of more processors is the increasing cost of communication.
It should be noted, however, that communication costs are expected to fall as the technology
improves, so this effect should become less significant in the future. Finally, with a better
understanding of the dynamic and unpredictable path that the parallel optimization algorithm
described below takes to a solution, we will be better able to appreciate the subtle effect that the
solution order of the subproblems has on overall performance.

2.  THE  FORMULATION  OF  SLPS 
2.1.  A  One-Day  Diet  Problem 

The One-Day Diet Problem is an example taken from Chvátal [Chv83]. Its mathematical formulation
is given below as DIET1, with specific examples for the problem data A, b, c, and u. The
problem is to find the optimal selection of six commodities* x, based on their corresponding costs
(given by c) and their relative contributions toward satisfying the minimum daily requirements
b for CALCIUM, PROTEIN, and ENERGY. The number of requirements is limited to three for
simplicity's sake, while the amount of each commodity selected to satisfy them is bounded above
by satiation points u. The boxes in Figure 2.1 represent the pattern of nonzero coefficients in the
costs and constraints. The problem is dense.

Figure 2.1. Structure of the one-day diet problem (right-hand sides CALCIUM ≥ 800, PROTEIN ≥ 55, ENERGY ≥ 2000).

DIET1 is a linear program for determining a single day's purchases while spending the least
amount of money. It will be used as a base case for formulating and studying example multi-day
diet-planning SLPs. The primal and dual formulations of DIET1 are:

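(Primal) minimize $c^T x$ subject to $Ax \ge b$, $0 \le x \le u$.

(Dual) maximize $b^T \pi - u^T \sigma$ subject to $A^T \pi - \sigma \le c$, $\sigma \ge 0$, $\pi \ge 0$.

The following notation combines the dual variables $\pi$ and $\sigma$ with the primal formulation:

$$
\text{(DIET1)}\quad
\begin{aligned}
\text{minimize}\quad & c^T x \\
\text{subject to}\quad \pi:\quad & Ax \ge b, \\
\sigma:\quad & 0 \le x \le u.
\end{aligned}
$$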

* OATMEAL, CHICKEN, EGGS, MILK, PIE, and PORK & BEANS.
The optimal selection of commodities is $x = (4\ 0\ 0\ 4.5\ 2\ 0)^T$, with the corresponding
dual solution $\pi = (0\ 0\ 0.05625)^T$ and $\sigma = (3.1875\ 0\ 0\ 0\ 3.625\ 0)^T$. Seven iterations
of the simplex method [Dan63] are required to obtain this solution using MINOS 5.1 [MS87] with
the default parameter settings, resulting in a minimum cost of $c^T x = 92.5$. From the solution
to the dual, one might determine that the ENERGY constraint is the most difficult to satisfy
given the available commodities, since it is the only requirement with a nonzero price. Because
the prices on the CALCIUM and PROTEIN constraints are zero, one can quickly determine that the
commodities chosen are relatively low in ENERGY and high in CALCIUM and PROTEIN.
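The claimed optimum can be checked without an LP solver: by weak duality, a primal-feasible $x$ and a dual-feasible $(\pi, \sigma)$ with equal objective values are both optimal. The Python sketch below performs that check in exact rational arithmetic. The entries of $A$ are not printed in this report (they appeared only in Figure 2.1), so the nutrient values used here are an assumption, taken from Chvátal's version of the example [Chv83]; they do reproduce the optimum reported above. The helper names `dot` and `certify` are illustrative only.

```python
from fractions import Fraction as F

# Commodities: OATMEAL, CHICKEN, EGGS, MILK, PIE, PORK & BEANS.
c = [3, 24, 13, 9, 20, 19]          # costs, as listed for DIET2 and DIET3
u = [4, 3, 2, 8, 2, 2]              # satiation bounds
# Rows CALCIUM, PROTEIN, ENERGY.  ASSUMED entries (Chvatal's example data);
# this report does not print the entries of A.
A = [
    [2, 12, 54, 285, 22, 80],
    [4, 32, 13, 8, 4, 14],
    [110, 205, 160, 160, 420, 260],
]
b = [800, 55, 2000]

# Reported primal and dual solutions of DIET1, as exact fractions.
x = [F(4), F(0), F(0), F("4.5"), F(2), F(0)]
pi = [F(0), F(0), F("0.05625")]
sigma = [F("3.1875"), F(0), F(0), F(0), F("3.625"), F(0)]


def dot(p, q):
    return sum(pj * qj for pj, qj in zip(p, q))


def certify(A, b, c, u, x, pi, sigma):
    """Check feasibility of both solutions; return (primal_obj, dual_obj)."""
    m, n = len(A), len(c)
    # Primal feasibility: Ax >= b and 0 <= x <= u.
    primal_ok = (all(dot(A[i], x) >= b[i] for i in range(m))
                 and all(0 <= x[j] <= u[j] for j in range(n)))
    # Dual feasibility: A^T pi - sigma <= c, pi >= 0, sigma >= 0.
    dual_ok = (all(sum(A[i][j] * pi[i] for i in range(m)) - sigma[j] <= c[j]
                   for j in range(n))
               and min(pi) >= 0 and min(sigma) >= 0)
    if not (primal_ok and dual_ok):
        raise ValueError("not a primal/dual feasible pair")
    return dot(c, x), dot(b, pi) - dot(u, sigma)


primal, dual = certify(A, b, c, u, x, pi, sigma)
# Weak duality: equal objective values prove that both solutions are optimal.
print(float(primal), float(dual))   # 92.5 92.5
```

Both objectives come out as exactly 185/2 = 92.5, and the nonzero entries of $\sigma$ sit on OATMEAL and PIE, the two commodities held at their upper bounds, as complementary slackness requires.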


2.2.  A  Two-Day  Diet  Problem 

DIET2 is a linear program that plans a selection of commodities over two days. It is similar to
repeating DIET1 twice, but the daily requirement of ENERGY is relaxed so that it may be satisfied
in any combination over both days instead of on each day individually. The individual ENERGY
constraints were added together, doubling the value of the right-hand side (RHS) entry.


Figure 2.2. Structure of the two-day diet problem (columns DAY1 and DAY2; rows COST, then CALCIUM ≥ 800 and PROTEIN ≥ 55 for DAY1, the shared ENERGY ≥ 4000, and CALCIUM ≥ 800 and PROTEIN ≥ 55 for DAY2).

Figure 2.2 shows the two-step sparse staircase coefficient pattern of DIET2. If the two individual
DIET1-type ENERGY constraints had not been added together when forming DIET2, the optimal
solution x of DIET1, repeated twice, would be the unique optimal solution to such a problem.
However, because we combined the individual ENERGY constraints into a single constraint, this
optimal solution to DIET2 is not unique, nor is it basic*.


*  See  Appendix  A  for  a  proof. 




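$$
\text{(DIET2)}\quad
\begin{aligned}
\text{minimize}\quad & c_1^T x_1 + c_2^T x_2 \\
\text{subject to}\quad \pi_1:\quad & A_1 x_1 \ge b_1, \\
\pi_2:\quad & B_1 x_1 + A_2 x_2 \ge b_2, \\
\sigma_1:\quad & 0 \le x_1 \le u_1, \qquad \sigma_2:\ 0 \le x_2 \le u_2,
\end{aligned}
$$

where

$c_1^T = c_2^T = (3\ 24\ 13\ 9\ 20\ 19)$ and $u_1 = u_2 = (4\ 3\ 2\ 8\ 2\ 2)^T$.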

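The set of optimal solutions to DIET2 contains the optimal solution to DIET1 repeated twice
(the point $\lambda = 1/2$ below). The locus of DIET2's optimal solutions is:

$$
\begin{aligned}
(x_1^T\ x_2^T) &= \lambda\,(4\ 0\ 0\ 5.125\ 2\ 0\ \ 4\ 0\ 0\ 3.875\ 2\ 0) + (1-\lambda)\,(4\ 0\ 0\ 3.875\ 2\ 0\ \ 4\ 0\ 0\ 5.125\ 2\ 0), \quad \lambda \in [0, 1], \\
(\pi_1^T\ \pi_2^T) &= (0\ 0\ \ 0.05625\ 0\ 0), \\
\sigma_1^T = \sigma_2^T &= (3.1875\ 0\ 0\ 0\ 3.625\ 0).
\end{aligned}
$$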

The  added  freedom  in  choosing  the  two-day  selection  of  goods  allows  selections  of  each  day 
to  be  mutually  dependent.  Pair-wise  dependence  between  repeating  collections  of  variables  is  the 
characteristic  of  SLPs  that  gives  them  their  serial  communication  structure.  DIET2  is  our  example 
two-period  SLP. 
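The locus can be checked in the same way. The sketch below again assumes Chvátal's nutrient data for the CALCIUM, PROTEIN and ENERGY rows (these entries are not printed in this report). It evaluates three points of the segment, confirms that each satisfies the DIET2 constraints (daily CALCIUM ≥ 800 and PROTEIN ≥ 55, shared ENERGY ≥ 4000), and compares the cost with the dual bound $b^T\pi - u^T\sigma$ given by the listed prices. At $\lambda = 0$ and $\lambda = 1$ one day's PROTEIN constraint becomes tight at 55, which is why the segment ends there. The function names are hypothetical.

```python
from fractions import Fraction as F

# Commodities: OATMEAL, CHICKEN, EGGS, MILK, PIE, PORK & BEANS.
# ASSUMED nutrient rows (Chvatal's example); not printed in this report.
CALCIUM = [2, 12, 54, 285, 22, 80]
PROTEIN = [4, 32, 13, 8, 4, 14]
ENERGY = [110, 205, 160, 160, 420, 260]
c = [3, 24, 13, 9, 20, 19]
u = [4, 3, 2, 8, 2, 2]


def dot(p, q):
    return sum(pj * qj for pj, qj in zip(p, q))


def diet2_feasible(x1, x2):
    """DIET2: daily CALCIUM >= 800 and PROTEIN >= 55, shared ENERGY >= 4000."""
    daily = all(dot(CALCIUM, x) >= 800 and dot(PROTEIN, x) >= 55 for x in (x1, x2))
    shared = dot(ENERGY, x1) + dot(ENERGY, x2) >= 4000
    bounded = all(0 <= xj <= uj for x in (x1, x2) for xj, uj in zip(x, u))
    return daily and shared and bounded


def locus_point(lam):
    """lam * P + (1 - lam) * Q on the segment of optimal DIET2 solutions."""
    P = [4, 0, 0, F("5.125"), 2, 0, 4, 0, 0, F("3.875"), 2, 0]
    Q = [4, 0, 0, F("3.875"), 2, 0, 4, 0, 0, F("5.125"), 2, 0]
    z = [lam * p + (1 - lam) * q for p, q in zip(P, Q)]
    return z[:6], z[6:]


# Listed dual solution: price 0.05625 on the shared ENERGY row only and the
# same sigma on both days.  Per-day dual feasibility: ENERGY_j*pi - sigma_j <= c_j.
pi_energy = F("0.05625")
sigma = [F("3.1875"), 0, 0, 0, F("3.625"), 0]
assert all(ENERGY[j] * pi_energy - sigma[j] <= c[j] for j in range(6))
dual_bound = 4000 * pi_energy - 2 * dot(u, sigma)   # b^T pi - u^T sigma

costs = []
for lam in (F(0), F(1, 2), F(1)):
    x1, x2 = locus_point(lam)
    assert diet2_feasible(x1, x2)
    costs.append(dot(c, x1) + dot(c, x2))
print([float(v) for v in costs], float(dual_bound))   # [185.0, 185.0, 185.0] 185.0
```

Every point has cost 185, twice the DIET1 optimum, and matches the dual bound, so each is optimal; $\lambda = 1/2$ returns the DIET1 selection on both days.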

2.3. A Three-Day Diet Problem

Our example three-period SLP (DIET3) determines the optimal selection of three days'
commodities. In this example the general staircase pattern begins to emerge from the nonzero coefficients
of the constraints, as shown in Figure 2.3. The first and last periods (DAY1 and DAY3) each have
only one adjacent period, or collection of variables, while the middle period (DAY2) has neighbors
that precede and follow it. For an n-day problem, there will be $n - 2$ such "middle"
periods, each of which has two neighbors.
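The block pattern of Figure 1.1 and the neighbor structure just described can be generated mechanically. This minimal sketch (hypothetical function names) places $A_t$ on the diagonal and $B_{t-1}$ immediately to its left, following the row blocks $B_{t-1} x_{t-1} + A_t x_t \ge b_t$ of DIET2 and DIET3, and lists the periods that share a constraint block with each period.

```python
def staircase_pattern(n):
    """Block pattern of an n-period SLP: row block t holds B_{t-1} and A_t."""
    rows = []
    for t in range(1, n + 1):
        row = []
        for s in range(1, n + 1):
            if s == t:
                row.append(f"A{t}")      # period t's own constraints
            elif s == t - 1:
                row.append(f"B{s}")      # coupling to the preceding period
            else:
                row.append("..")         # zero block
        rows.append(row)
    return rows


def neighbors(t, n):
    """Periods (1-based) that share a constraint block with period t."""
    return [s for s in (t - 1, t + 1) if 1 <= s <= n]


for row in staircase_pattern(4):
    print(" ".join(row))
print([t for t in range(1, 5) if len(neighbors(t, 4)) == 2])   # [2, 3]
```

For four periods this prints the rows `A1 .. .. ..`, `B1 A2 .. ..`, `.. B2 A3 ..` and `.. .. B3 A4`, and the $n - 2 = 2$ middle periods are 2 and 3.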
Figure 2.3. Structure of the three-day diet problem.

In DIET3 the ENERGY requirement is shared between the first two days as in DIET2. In
addition, the PROTEIN requirement is shared between the last two days. This pattern can be
propagated by extending the last constraint over two days and listing the remaining two
individually below it. The third and fourth days share CALCIUM, and after that the cyclical pattern
ENERGY, PROTEIN, CALCIUM repeats.

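$$
\text{(DIET3)}\quad
\begin{aligned}
\text{minimize}\quad & c_1^T x_1 + c_2^T x_2 + c_3^T x_3 \\
\text{subject to}\quad \pi_1:\quad & A_1 x_1 \ge b_1, \\
\pi_2:\quad & B_1 x_1 + A_2 x_2 \ge b_2, \\
\pi_3:\quad & B_2 x_2 + A_3 x_3 \ge b_3, \\
\sigma_1:\quad & 0 \le x_1 \le u_1, \qquad \sigma_2:\ 0 \le x_2 \le u_2, \qquad \sigma_3:\ 0 \le x_3 \le u_3,
\end{aligned}
$$

where $c_t^T = (3\ 24\ 13\ 9\ 20\ 19)$ and $u_t = (4\ 3\ 2\ 8\ 2\ 2)^T$, $t = 1, 2, 3$.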

The  solution  to  DIET3  is  similar  to  that  of  DIET2  in  that  the  optimal  single-day  selection 
from  DIET1,  repeated  three  times,  is  optimal  overall.  In  addition,  as  before,  this  solution  is  one 
of  a  class  of  optimal  solutions  to  DIET3: 

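$$
\begin{aligned}
(x_1^T\ x_2^T) &= \lambda\,(4\ 0\ 0\ 5.125\ 2\ 0\ \ 4\ 0\ 0\ 3.875\ 2\ 0) + (1-\lambda)\,(4\ 0\ 0\ 3.875\ 2\ 0\ \ 4\ 0\ 0\ 5.125\ 2\ 0), \quad \lambda \in [0, 1], \\
x_3^T &= (4\ 0\ 0\ 4.5\ 2\ 0), \\
(\pi_1^T\ \pi_2^T\ \pi_3^T) &= (0\ 0\ 0.05625\ 0\ 0\ 0.05625\ 0), \\
\sigma_1^T = \sigma_2^T = \sigma_3^T &= (3.1875\ 0\ 0\ 0\ 3.625\ 0).
\end{aligned}
$$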

There  is  only  one  degree  of  freedom  in  the  solution  because  the  PROTEIN  constraints  are 
nonbinding. 

3.  REFORMULATING  SLPS  INTO  MULTIPLE  SUBPROBLEMS 

We will now focus on reformulating the original general SLP into many interrelated subproblems
using a technique called Benders decomposition [Ben62]. The purpose of creating a collection of
subproblems in place of a single problem is to solve the collection simultaneously with a parallel
computer. For the scope of this report it will suffice to present the subproblem formulations
directly and then go on to the parallel algorithm for solving the SLP. Each subproblem formulation
contains independent portions of the original problem data, additional necessary conditions (cuts),
and accounting variables that are simple machinery for algorithmic support; most calculations are
implicit in the formulations, not explicit in the algorithm.

The following discussion will focus mainly on the case in which the subproblems are solved to
optimality, placing less emphasis on cases in which their solutions are infeasible or unbounded.
This facilitates the exposition of the algorithm, saving the more complex cases for the next section,
when our insight is sufficiently developed.

3.1.  Two-Period  SLP 

Benders decomposition differs from the more familiar Dantzig-Wolfe decomposition in that the
former partitions an LP according to its variables, whereas the latter partitions it according to its
constraints. Each subproblem fixes the values of certain primal and dual variables of the original
SLP in order to solve a reduced problem over a smaller set of variables. DIET2 can easily be
decomposed into the two interdependent subproblems DIET2.1 and DIET2.2, which are solved one
after the other, with appropriate modifications to certain algorithm parameters ($\Pi_2$, $\delta_2$, $\rho_2$,
$\delta_1$, and $y_1$), until the modifications no longer affect the solutions of the two subproblems. These
parameters are explained in detail following the formulations. In general, a subscript refers to a
subproblem number, and a superscript $k$ refers to a variable's value in the $k$th solution to the
subproblem. We will use the notation $\hat{x}$ for intermediate subproblem solutions, with $\bar{x}$ to
denote the final solution for the full SLP. All other variables, primal and dual, will follow the same notation.


Figure  3.1.  Information  flow  in  the  two-period  SLP. 

DIET2.1 initially drops the ENERGY constraint and the second day's PROTEIN and CALCIUM
constraints from the original SLP, and minimizes the first day's COST subject only to that day's
CALCIUM and PROTEIN requirements. Naturally, a certain amount of ENERGY is thereby offered
by DIET2.1. It is calculated as $y_1 = B_1 x_1$ in the rows corresponding to the dual
variables $p_1$.

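$$
\text{(DIET2.1)}\quad
\begin{aligned}
\text{minimize}\quad & c_1^T x_1 + \theta_1 = z_1 \\
\text{subject to}\quad \pi_1:\quad & A_1 x_1 \ge b_1, \\
p_1:\quad & B_1 x_1 - I y_1 = 0, \\
\mu_1:\quad & \Pi_2 y_1 + \delta_2 \theta_1 \ge \rho_2, \\
\sigma_1:\quad & 0 \le x_1 \le u_1.
\end{aligned}
$$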

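$$
\text{(DIET2.2)}\quad
\begin{aligned}
\text{minimize}\quad & c_2^T x_2 = z_2 \\
\text{subject to}\quad \eta_2:\quad & \delta_1^k w_2 = \delta_1^k, \\
\pi_2:\quad & A_2 x_2 + y_1^k w_2 \ge b_2, \\
\sigma_2:\quad & 0 \le x_2 \le u_2, \quad w_2 \ge 0, \quad \eta_2 = 0 \text{ if } \delta_1^k = 0.
\end{aligned}
$$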

Taking $(\pi_1^1, z_1^1, x_1^1, y_1^1, \theta_1^1)$ as the first solution to DIET2.1, with its parameters $\Pi_2$, $\delta_2$,
and $\rho_2$ set to zero, we note that $\delta_1^1$ and $y_1^1$ will be used in DIET2.2's formulation. We set
$\delta_1^1 = 0$ if DIET2.1 finishes unbounded and $\delta_1^1 = 1$ if it finishes optimal; $\delta_1^1$ is left undefined
if DIET2.1 finishes infeasible, because in that case the entire SLP must be infeasible.

Let us assume that DIET2.1 finishes optimal. We set $\delta_1^1 = 1$ so that DIET2.2 adopts $y_1^1$ as
the amount of ENERGY offered by DIET2.1. (Note that $y_1^1$ is effectively subtracted from the
right-hand side when $\delta_1^1 = 1$, because $w_2$ is then fixed at 1.) After solving DIET2.2 and assuming
optimality, we return to DIET2.1 with the optimal prices on DIET2.2's constraints corresponding
to $\pi_2$. These are used to impose a new constraint on $y_1$ which ensures that the same dual feasible
extreme point will not be obtained again when DIET2.2 is solved, unless the optimal value of
$y_1$ has been reached. This is the verbal interpretation of the $\mu_1$ rows of the first-day subproblem.
The inequality $\Pi_2 y_1 + \delta_2 \theta_1 \ge \rho_2$ in DIET2.1 is a collection of added constraints
$(\pi_2^k)^T y_1 + \delta_2^k \theta_1 \ge \rho_2^k$ obtained from $K$ dual solutions to the second-day subproblem,
$(\eta_2^k, z_2^k, \pi_2^k)$, $k = 1, 2, \ldots, K$. In particular, $\delta_2 = (\delta_2^1, \ldots, \delta_2^K)^T$ is a vector of
Kronecker delta functions indicating optimality in each of the $K$ solutions. If the $k$th dual solution
to DIET2.2 is dual feasible and bounded above, then we set $\delta_2^k = 1$; if it is dual feasible and
unbounded above, then $\delta_2^k = 0$; if it is dual infeasible (indicated by an unbounded primal solution),
the two-period primal SLP is unbounded. The vector $\rho_2 = (\rho_2^1, \ldots, \rho_2^K)^T$ is a corresponding
collection of scalars calculated as $\rho_2^k = z_2^k - \eta_2^k$
from $z_2^k$ (the objective value, or sum of infeasibilities) and $\eta_2^k$ (the price, or multiplier, on the
constraint corresponding to $\eta_2$). Finally, $\Pi_2 = (\pi_2^1, \ldots, \pi_2^K)^T$, where $\pi_2^k$ is the vector of prices or
multipliers on the constraints corresponding to $\pi_2$.
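A minimal Python sketch of how one row of the $\mu_1$ constraints might be assembled from the $k$th solution of DIET2.2. The `Cut` class, `make_cut` and the status strings are hypothetical names, and the numbers in the usage lines are a hand-worked illustration rather than output reported here. For that illustration we assume DIET2.1 offers $y_1^k = (2000, 0, 0)$ on DIET2.2's rows (shared ENERGY, CALCIUM, PROTEIN), so DIET2.2 reduces to DIET1 with $z_2^k = 92.5$ and $\pi_2^k = (0.05625, 0, 0)$; under the sign convention assumed here, $\eta_2^k = -(\pi_2^k)^T y_1^k = -112.5$.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cut:
    """One row (pi_2^k)^T y_1 + delta_2^k theta_1 >= rho_2^k of DIET2.1."""
    pi: List[float]   # prices on DIET2.2's pi_2 rows
    delta: int        # 1 = optimality cut, 0 = feasibility cut
    rho: float        # z_2^k - eta_2^k

    def slack(self, y1: Sequence[float], theta1: float) -> float:
        """Left-hand side minus right-hand side; negative means violated."""
        lhs = sum(p * y for p, y in zip(self.pi, y1)) + self.delta * theta1
        return lhs - self.rho


def make_cut(pi_k, z_k, eta_k, status):
    """Build a cut from the kth solution of DIET2.2.

    'optimal': dual feasible and bounded -> optimality cut (delta = 1).
    'infeasible': dual unbounded, z_k is the sum of infeasibilities
                  -> feasibility cut (delta = 0).
    'unbounded': dual infeasible, so the two-period SLP is unbounded."""
    if status == "optimal":
        delta = 1
    elif status == "infeasible":
        delta = 0
    elif status == "unbounded":
        raise ValueError("the two-period SLP is unbounded")
    else:
        raise ValueError(f"unknown status {status!r}")
    return Cut(list(pi_k), delta, z_k - eta_k)


# Hand-worked illustration (assumed values, see the lead-in).
y1 = [2000.0, 0.0, 0.0]
cut = make_cut([0.05625, 0.0, 0.0], z_k=92.5, eta_k=-112.5, status="optimal")
print(cut.rho)                            # 205.0
print(abs(cut.slack(y1, 92.5)) < 1e-9)    # True: tight at the current offer
```

The resulting cut reads $0.05625\,y_{1,\text{ENERGY}} + \theta_1 \ge 205$. At the current offer it forces $\theta_1 \ge 92.5$, the actual second-day cost, and each unit of ENERGY withheld from day 2 raises that bound by 0.05625.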

One way of interpreting the addition of constraints to DIET2.1 is that the matrix $A_2$ is being
approximated by the independent rows in $\Pi_2$. It is therefore sufficient to carry along at most as
many constraints as the row-rank of $A_2$. In other words, when such constraints
become slack, they may be discarded. However, if they turn out to be binding in the final optimal
solution, they will be regenerated. Discarding cuts runs the risk of cycling due to degeneracy: it
could lead to a repeated pattern of discarding and regenerating the same constraints.
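Discarding slack cuts is simple to express. The sketch below (hypothetical function name and tuple layout; the two cuts in the usage lines are made-up illustrative values) keeps only the rows of $\Pi_2 y_1 + \delta_2 \theta_1 \ge \rho_2$ that are binding at the current first-period solution. It deliberately omits any safeguard against the discard-and-regenerate cycling described above.

```python
def prune_slack_cuts(cuts, y1, theta1, tol=1e-9):
    """Keep only the cuts that are binding at the current (y1, theta1).

    Each cut is a tuple (pi, delta, rho) representing
    pi^T y1 + delta * theta1 >= rho.  A discarded cut may be regenerated
    later; repeated discarding and regeneration is the cycling risk noted
    in the text, so a production code would need a safeguard (not shown)."""
    kept = []
    for pi, delta, rho in cuts:
        slack = sum(p * y for p, y in zip(pi, y1)) + delta * theta1 - rho
        if slack <= tol:          # binding (or violated): keep it
            kept.append((pi, delta, rho))
    return kept


cuts = [([0.05625, 0.0, 0.0], 1, 205.0),   # tight at the point below
        ([0.0, 0.0, 0.0], 1, 50.0)]        # theta1 >= 50: slack by 42.5
print(len(prune_slack_cuts(cuts, [2000.0, 0.0, 0.0], 92.5)))   # 1
```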

3.2.  n-Period  SLP 

The subproblems of the n-period SLP are of three types: the first period, as in DIET2.1; the last
period, as in DIET2.2; and those with two adjacent subproblems, which are a combination of the
two other forms. The three subproblems of DIET3 (DIET3.1, DIET3.2, DIET3.3) exemplify these
three types and thereby those of an n-period SLP.


Figure  3.2.  Information  flow  in  the  n-period  SLP. 

Following the manner in which DIET2 was decomposed into two subproblems, we will quickly
go through the division of the three-day diet problem into three subproblems. This exercise will
demonstrate two key procedures. The first is the formulation of a subproblem that accepts solutions
from two others, the previous and following subproblems, as opposed to only one other in the
DIET2 cases; this allows a generalization to the n-period SLP. The second procedure is associated
with the possibility of using more than one processor, thereby carrying Benders decomposition into
the multiprocessor environment. As described, a subproblem may have more than one neighbor.
Hence, an algorithmic choice must be made as to which neighbor to solve next when there is only
one processor. When multiple processors are available, no such choice is needed: all of the
neighboring subproblems may be solved simultaneously.

The three subproblems of DIET3 are formed by partitioning the daily selections as before.
DIET3.1 will plan purchases for the first day, DIET3.2 for the second day, and DIET3.3 for the
third day. As before, $y_1$ is the amount of ENERGY in the first day's selection, while $y_2$ is the
amount of PROTEIN in the second day's selection.

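$$
\text{(DIET3.1)}\quad
\begin{aligned}
\text{minimize}\quad & c_1^T x_1 + \theta_1 = z_1 \\
\text{subject to}\quad \pi_1:\quad & A_1 x_1 \ge b_1, \\
p_1:\quad & B_1 x_1 - I y_1 = 0, \\
\mu_1:\quad & \Pi_2 y_1 + \delta_2 \theta_1 \ge \rho_2, \\
\sigma_1:\quad & 0 \le x_1 \le u_1.
\end{aligned}
$$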

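$$
\text{(DIET3.2)}\quad
\begin{aligned}
\text{minimize}\quad & c_2^T x_2 + \theta_2 = z_2 \\
\text{subject to}\quad \eta_2:\quad & \delta_1^k w_2 = \delta_1^k, \\
\pi_2:\quad & A_2 x_2 + y_1^k w_2 \ge b_2, \\
p_2:\quad & B_2 x_2 - I y_2 = 0, \\
\mu_2:\quad & \Pi_3 y_2 + \delta_3 \theta_2 \ge \rho_3, \\
\sigma_2:\quad & 0 \le x_2 \le u_2, \quad w_2 \ge 0, \quad \eta_2 = 0 \text{ if } \delta_1^k = 0.
\end{aligned}
$$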

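$$
\text{(DIET3.3)}\quad
\begin{aligned}
\text{minimize}\quad & c_3^T x_3 = z_3 \\
\text{subject to}\quad \eta_3:\quad & \delta_2^k w_3 = \delta_2^k, \\
\pi_3:\quad & A_3 x_3 + y_2^k w_3 \ge b_3, \\
\sigma_3:\quad & 0 \le x_3 \le u_3, \quad w_3 \ge 0, \quad \eta_3 = 0 \text{ if } \delta_2^k = 0.
\end{aligned}
$$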

Subproblem    Parameters
DIET3.1       $\Pi_2, \delta_2, \rho_2$
DIET3.2       $\delta_1^k, y_1^k, \Pi_3, \delta_3, \rho_3$
DIET3.3       $\delta_2^k, y_2^k$

Each  of  the  subproblem  formulations  contains  parameters  that  are  based  on  the  solutions  of 
neighboring  subproblems.  The  next  section  will  describe  the  continual  updating  of  these  parameters 
as  part  of  a  parallel  decomposition  algorithm. 
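The parameter table generalizes directly to $n$ periods: subproblem $t$ receives $\delta_{t-1}^k$ and $y_{t-1}^k$ from its preceding neighbor and the cut data $\Pi_{t+1}$, $\delta_{t+1}$, $\rho_{t+1}$ from its following neighbor. The sketch below (hypothetical function name, with parameters rendered as plain strings) reproduces the DIET3 table and gives the first, middle and last subproblems three, five and two parameters respectively.

```python
def parameters(t, n):
    """Communication parameters of subproblem t (1-based) of an n-period SLP."""
    params = []
    if t > 1:   # from the preceding neighbor: its status and offered amount
        params += [f"delta{t - 1}^k", f"y{t - 1}^k"]
    if t < n:   # from the following neighbor: accumulated cut data
        params += [f"Pi{t + 1}", f"delta{t + 1}", f"rho{t + 1}"]
    return params


for t in (1, 2, 3):
    print(f"DIET3.{t}", parameters(t, 3))
# DIET3.1 ['Pi2', 'delta2', 'rho2']
# DIET3.2 ['delta1^k', 'y1^k', 'Pi3', 'delta3', 'rho3']
# DIET3.3 ['delta2^k', 'y2^k']
```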
4. SOLVING SUBPROBLEMS ON A PARALLEL COMPUTER

From  the  subproblem  formulations  in  the  previous  section,  we  have  seen  that  any  given  subproblem 
contains  from  two  to  five  parameters.  These  are  initially  undefined,  and  are  first  given  values  when 
a  neighboring  solution  is  communicated.  In  the  serial  dual-decomposition  algorithm  we  begin  by 
solving  the  last-period  subproblem  (knowing  that  any  solution  must  meet  the  dual  constraints 
corresponding to $x_n$), and then work toward obtaining a dual-feasible solution and then an optimal
solution  for  all  periods,  if  possible. 

In the parallel algorithm we begin by solving all subproblems simultaneously, with their
communication parameters initially set to zero. Their solutions, despite not including neighboring
information, are still relevant and can be used to construct modifications. When used to modify
right-hand sides, they will direct the new optimal solutions to a possibly different set of relevant
solutions. When used to add constraints, the type of constraint depends on whether the neighboring
subproblem solution was infeasible or feasible and, if feasible, whether bounded or unbounded.
Initially only the last subproblem solution may be used to add an optimality constraint, yet any
infeasible solution may be used to generate a necessary condition for feasibility (a feasibility
constraint). In general, an optimality constraint may be passed only if the present subproblem already
contains such a constraint (except, of course, for the last subproblem).
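The rule above implies that optimality constraints spread backward one subproblem at a time. Under two simplifying assumptions, that every subproblem finishes optimal and that all subproblems are re-solved once per synchronous round (the actual algorithm is asynchronous), the hypothetical function below counts the rounds until subproblems 1 through $n - 1$ all hold an optimality constraint. The count is $n - 1$.

```python
def optimality_cut_rounds(n):
    """Synchronous rounds until subproblems 1..n-1 all hold an optimality cut.

    Rule from the text: subproblem t may pass an optimality constraint back to
    t-1 only if t is the last subproblem or already contains such a constraint.
    Assumes (hypothetically) that every subproblem finishes optimal and that
    all subproblems are re-solved once per round."""
    has_cut = [False] * (n + 1)          # 1-based; index 0 unused
    rounds = 0
    while not all(has_cut[1:n]):
        senders = [t for t in range(2, n + 1) if t == n or has_cut[t]]
        for t in senders:                # messages of this round arrive together
            has_cut[t - 1] = True
        rounds += 1
    return rounds


print([optimality_cut_rounds(n) for n in (2, 3, 4, 8)])   # [1, 2, 3, 7]
```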

Just  as  a  two-subproblem  decomposition,  with  no  middle  subproblems,  is  a  special  case  of  an 
n-subproblem  decomposition,  the  single-processor  algorithm  is  a  special  case  of  the  multiprocessor 
parallel  algorithm  with  one  subproblem  being  solved  at  a  time  instead  of  many.  With  only  one 
processor,  the  parallel  algorithm  reduces  to  Benders  decomposition. 

The parallel computer architecture used for solving SLPs with the parallel algorithm is assumed
to have numerous independent and powerful processors. The amount of memory available
locally to each individual processor is assumed to be substantial (at least 1/2 megabyte),
since the optimization code (MINOS 5.1) stored for each processor is large, and each subproblem
data set can be large. Shared- and distributed-memory multiprocessing computers, as well as
distributed-processing computers, are suitable for our application. Table 4.1 gives some examples
of commercially available parallel computers.


Type                    Name                       Number of Processors   Size of Memory
Shared Memory           Sequent Balance 8000       8 NS32032              8 Meg.
Distributed Memory      NCUBE                      4 Specialized          1/2 Meg ea.
Distributed Memory      Intel iPSC hypercube       64 80286               1/2 Meg ea.
Distributed Processor   VAX cluster on Ethernet    5 VAXstation IIs       8 Meg ea.

Table 4.1. Examples of parallel computer architectures.


In a discussion of alternative architectures, the issues of computational and communication
loads are of primary importance. The ideal is to distribute the computational load equally across
all processors while keeping the time used for communication to a minimum. The reformulation
(initialization) stage requires a large flow of information between the processors, but in the solution
stage the messages are typically small and infrequent. This is to be expected, because reformulation
involves distributing the original data into n subproblems, whereas during the solution stage most
time is spent solving LP subproblems with the simplex method.

The description of this algorithm is directed primarily toward a shared-memory implementation.
Parallel processors with distributed memory require an additional scheme for distributing
the workload so that the processors are responsible for disjoint subsets of the subproblems. If
the subset of a processor contains more than one subproblem, they should be handled as in the
single-processor shared-memory case (a special case of the parallel algorithm below). In addition,
the scheduling of subproblems between processors becomes an implicit result of message passing.

4.1.  Processes,  Jobs  and  Queues 

There will be a user-specified number of subproblems n and processors p involved in solving a
general SLP. The number of processors could exceed the number of subproblems (p > n), but this
would leave the extra processors unused, or inefficiently loaded. Hence, we assume that p ≤ n.

Associated  with  each  of  the  n  subproblems  is  a  job  consisting  of  the  loop  of  tasks  in  Figure  4.1. 
The  term  “job”  is  used  to  emphasize  the  fact  that  it  encompasses  more  than  solving  linear  program 
subproblems.  The  tasks  of  each  job  are  repeated  in  succession  and  any  processor  can  execute  them. 
A job is always in one of three states: run, running; pend, waiting to be run; or sleep, solved
and waiting for new information.


Figure  4.1.  The  RUN  JOB  loop. 

The  p  processors  repeatedly  execute  the  loop  in  Figure  4.2,  which  transfers  jobs  among  three 
shared  queues  (run,  pend,  and  sleep)  according  to  the  result  of  running  through  the  job  loop. 
Each  queue  corresponds  to  one  of  the  three  job  states. 


Figure  4.2.  The  process  loop. 
The run queue contains the jobs currently running on p or fewer processors, the results of
which  will  define  the  next  states  of  the  jobs.  Thus,  it  will  never  contain  more  than  p  jobs.  The 
pend  queue  is  where  a  processor  finds  jobs  (subproblems)  waiting  to  be  run  (solved),  and  the 
sleep  queue  is  where  jobs  reside  while  at  the  WAIT  FOR  MESSAGE  step. 

In  the  process  loop  of  Figure  4.2,  GET  JOB  involves  inspecting  the  pend  queue  and,  if  there 
is  an  available  job,  transferring  it  to  the  run  queue.  Jobs  are  run  (RUN  JOB)  as  a  repeated 
sequence  of  tasks,  beginning  at  the  SOLVE  step.  After  the  jobs  pass  through  the  SOLVE  step, 
they  continue  around  the  loop  unless  the  solution  is  unchanged,  in  which  case  the  job  is  placed  in 
the  sleep  queue  (QUEUE  JOB)  to  wait  for  a  message. 

In  a  more  advanced  implementation,  one  might  consider  interrupting  the  SOLVE  step  after 
a  specified  number  of  iterations  and  placing  the  job,  though  unfinished,  back  in  the  pend  queue. 
This would have the effect of further balancing the computational effort across the subproblems,
and  would  incorporate  new  information  more  quickly.  In  addition,  it  has  recently  been  observed 
that  if  the  simplex  method  is  applied  to  the  dual  formulation  of  a  subproblem,  every  dual-feasible 
extreme  point  visited  by  the  simplex  method  has  the  potential  to  form  a  new  necessary  constraint 
on  the  preceding  neighbor  [HLS88]. 


Figure 4.3. The transfer of jobs amongst queues.

The  modification  of  a  subproblem  in  Benders  decomposition  is  governed  by  the  neighbor  that 
sent  the  message;  if  it  was  the  preceding  neighbor,  then  the  RHS  is  modified;  if  it  was  the  following 
one,  then  a  constraint  is  added.  Once  modified,  subproblems  are  solved  using  the  simplex  method 
and  their  solutions  are  broadcast  to  their  neighbors.  If  a  neighbor  is  in  the  sleep  queue  when  the 
solution  is  sent,  it  is  awakened  and  transferred  to  the  pend  queue,  since  it  is  now  able  to  leave  the 
WAIT  FOR  MESSAGE  step  (see  Figure  4.3). 
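The queue bookkeeping of Figures 4.1 to 4.3 can be illustrated with a single-threaded Python simulation in which up to p jobs are taken from the pend queue per round. This is a toy model under stated assumptions, not the Sequent implementation: the SOLVE step is replaced by a hypothetical update in which a job's "solution" becomes the largest value it has seen from itself and its neighbors, which can change only finitely often. A job whose solution changes broadcasts it and continues around the loop; an unchanged job goes to the sleep queue; a message to a sleeping neighbor wakes it into the pend queue; and the loop ends when the pend queue is empty, that is, at deadlock.

```python
from collections import deque


def run_process_loop(initial, p):
    """Toy run of the process loop with pend, run and sleep queues.

    Hypothetical SOLVE step: job t's "solution" becomes the largest of its
    own initial value and the latest messages from jobs t-1 and t+1.  Values
    only increase and come from a finite set, so the loop reaches deadlock."""
    n = len(initial)
    value = list(initial)
    inbox = [{} for _ in range(n)]      # latest message from each neighbor
    pend = deque(range(n))              # every job starts in the pend queue
    sleep = set()
    solved = [False] * n
    rounds = 0
    while pend:                         # empty pend (and run) queue: deadlock
        rounds += 1
        run = [pend.popleft() for _ in range(min(p, len(pend)))]  # GET JOB
        for t in run:
            new = max([initial[t], *inbox[t].values()])             # SOLVE
            changed = new != value[t] or not solved[t]
            solved[t] = True
            value[t] = new
            if not changed:
                sleep.add(t)            # QUEUE JOB: wait for a message
                continue
            for s in (t - 1, t + 1):    # broadcast the new solution
                if 0 <= s < n:
                    inbox[s][t] = new
                    if s in sleep:      # wake job: sleep -> pend
                        sleep.remove(s)
                        pend.append(s)
            pend.append(t)              # continue around the loop
    assert len(sleep) == n              # equilibrium: every job is sleeping
    return value, rounds


values, rounds = run_process_loop([3, 1, 4, 1, 5], p=2)
print(values)   # [5, 5, 5, 5, 5]
```

With p = 1 the same function processes one job at a time, mirroring the statement that the single-processor algorithm is a special case of the parallel one.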

4.2.  Reaching  an  Equilibrium 

The p processors continue their loops, becoming idle only when there are no jobs in the pend
queue. If this happens, a quick inspection of the run queue will determine whether all jobs are sleeping.
If so, the system is deadlocked. Each processor recognizes deadlock as the signal to stop, but before
stopping, one predesignated processor will execute a cleanup operation such as printing a solution.
In the current design, deadlock is needed to stop the algorithm.

Given that a system cannot reach deadlock while jobs are waiting for messages that will always be
sent, one may wonder whether the processors will ever stop, and if so, how. When a subproblem
parameter modification does not cause an optimal solution of the subproblem to change, the job
does not rebroadcast its solution to its neighbors. At this point, the subproblem is said to be
in "equilibrium" with its neighbors; it is then placed in the sleep queue to wait for updating
information. All subproblems must reach a simultaneous equilibrium for the same reason that the
single-processor Benders algorithm does: (1) there is a finite number of dual extreme points in the
subproblems, and (2) a different one must be communicated each time to maintain disequilibrium.
Hence, at some point the collection of useful dual extreme points is exhausted. The equilibrium
relationship is reflexive and transitive, and so a system-wide equilibrium is achieved.

This argument is also valid when many such dual extreme points are being passed simultaneously,
as in the multiprocessor case. The condition for equilibrium is precisely deadlock, with all
jobs sleeping, since no new and useful information is forthcoming.

4.3.  Infeasible  and  Unbounded  Solutions 

What should be done if a subproblem finishes with an infeasible or a feasible but unbounded solution? If it is the first subproblem and it is infeasible (INF), then the entire SLP must be infeasible, because the constraints corresponding to $\pi_1$ and $\rho_1$ cannot be satisfied. If it is the last subproblem and it finishes unbounded (UNB), then the SLP must be unbounded, because there do not exist prices $\pi_n$ that can satisfy the dual conditions associated with the variables $x_n$ and $w_n$. These two cases correspond to the top and bottom entries in Figure 4.4. In both cases the algorithm stops.

Subproblem   Infeasible (INF)                   Unbounded (UNB)
First        SLP Infeasible                     Pass Extreme Ray Forward
Middle       Pass Feasibility Constraint Back   Pass Extreme Ray Forward
Last         Pass Feasibility Constraint Back   SLP Unbounded

Figure 4.4. Alternative recourses for infeasible and unbounded subproblems.
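The recourses of Figure 4.4 amount to a small lookup on a subproblem's position in the staircase and its terminal status. The sketch below uses string labels ("first", "middle", "last"; "INF", "UNB") chosen for illustration. They are assumptions and not identifiers from the report's code.

```python
def recourse(position, status):
    """Action for a subproblem that ends infeasible (INF) or unbounded (UNB).

    position is "first", "middle" or "last" in the staircase (hypothetical labels).
    """
    if position not in ("first", "middle", "last"):
        raise ValueError(f"unknown position {position!r}")
    if status == "INF":
        # Only the first subproblem's infeasibility settles the whole SLP.
        return "SLP infeasible" if position == "first" else "pass feasibility constraint back"
    if status == "UNB":
        # Only the last subproblem's unboundedness settles the whole SLP.
        return "SLP unbounded" if position == "last" else "pass extreme ray forward"
    raise ValueError(f"status must be 'INF' or 'UNB', got {status!r}")


STOPS = {"SLP infeasible", "SLP unbounded"}  # outcomes that end the algorithm

for pos in ("first", "middle", "last"):
    print(pos, recourse(pos, "INF"), "|", recourse(pos, "UNB"))
```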

In the remaining cases the algorithm continues. If a subproblem other than the first is infeasible, the infeasibility multipliers, say $\sigma_2$, are used to impose a constraint of the form $\sigma_2^T y_1 \ge \sigma_2^T b_2$ on the preceding subproblem ($\theta_1$ gets a zero entry). This constraint is a necessary condition on $y_1$ for the feasibility of the second subproblem and thus of the entire SLP. In general, a chain of such infeasibility conditions may extend from the $j$th subproblem back to the first and indicate that the entire SLP is infeasible.
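A feasibility cut of this kind rests on a Farkas certificate. For illustration only, the sketch below assumes that the second subproblem's constraints take the form $A_2 x_2 \ge b_2 - y_1$ with $x_2 \ge 0$. Under that assumption, multipliers $\sigma \ge 0$ with $\sigma^T A_2 \le 0$ and $\sigma^T (b_2 - y_1) > 0$ prove infeasibility, and any $y_1$ that keeps the subproblem feasible must satisfy $\sigma^T y_1 \ge \sigma^T b_2$. The helper names and the small instance are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(u, v))


def certifies_infeasibility(A, r, sigma):
    """True if sigma proves that {x >= 0 : A x >= r} is empty (Farkas).

    Conditions: sigma >= 0, sigma^T A <= 0 componentwise, sigma^T r > 0.
    Then sigma^T A x <= 0 < sigma^T r for every x >= 0, so A x >= r fails.
    """
    n_cols = len(A[0])
    sigma_A = [dot(sigma, [row[j] for row in A]) for j in range(n_cols)]
    return (all(s >= 0 for s in sigma)
            and all(v <= 0 for v in sigma_A)
            and dot(sigma, r) > 0)


def feasibility_cut(sigma, b):
    """Coefficients and RHS of the cut sigma^T y >= sigma^T b on y.

    Assumes (hypothetically) that the next subproblem reads A x >= b - y.
    """
    return list(sigma), dot(sigma, b)


# Hypothetical data: first component of x must be >= 2 - y[0] and <= 1 + y[1].
A2 = [[1, 0], [-1, 0]]
b2 = [2, -1]
y_trial = [0, 0]
sigma = [1, 1]
r = [bi - yi for bi, yi in zip(b2, y_trial)]
print(certifies_infeasibility(A2, r, sigma))  # True: infeasible at y_trial
coeffs, rhs = feasibility_cut(sigma, b2)      # cut: y[0] + y[1] >= 1
```

The trial point $y_1 = (0, 0)$ violates the cut, while $y_1 = (1, 0)$ satisfies it and indeed makes the second stage feasible.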

Likewise, if a subproblem other than the last finishes unbounded, two $w_2$-type columns are inserted in the following subproblem. The first is the extreme point, say $\bar y_1$, obtained from the primal feasible solution. The second is the extreme ray, say $\alpha \hat y_1$ ($\alpha \ge 0$), obtained from the column entering the basis, together with its accompanying cost taken from the incoming column's reduced cost. Only extreme-ray columns ever have coefficients in the objective row, which is why such coefficients were not explicitly included in the formulations of the previous section. The extreme-point column has its parameter set to one in order to modify the RHS (Section 3.3), and the extreme-ray column has its parameter set to zero to allow freedom in the direction of the unboundedness. The accompanying cost permits the subproblem to weigh the benefits of this direction against present and future costs.
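The effect of the two inserted columns on the following subproblem can be sketched under three illustrative assumptions. The extreme-point column enters with its weight fixed at one. The extreme-ray column's weight $\alpha \ge 0$ is left free. Each unit of $\alpha$ is charged the ray's accompanying cost. The function and argument names are hypothetical.

```python
def linking_input(point, ray, alpha, ray_cost):
    """RHS modification and objective charge from the two passed columns.

    Hypothetical sketch: point column weight fixed at 1, ray column weight
    alpha >= 0 chosen by the following subproblem, charged ray_cost per unit.
    """
    if alpha < 0:
        raise ValueError("the ray weight alpha must be nonnegative")
    rhs_shift = [p + alpha * d for p, d in zip(point, ray)]
    return rhs_shift, alpha * ray_cost


shift, cost = linking_input(point=[1, 2], ray=[1, 0], alpha=3, ray_cost=-2)
print(shift, cost)  # [4, 2] -6
```

With a negative `ray_cost`, moving along the ray lowers the objective. The following subproblem therefore trades that gain against its own costs when it chooses $\alpha$.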
If a finite solution is found to the ray-modified subproblem, the constraint it returns will necessarily restrict the old ray solution of the previous subproblem by giving the ray an unattractive positive cost.

If  an  infeasible  solution  is  found  to  the  ray-modified  subproblem,  the  returned  constraint  will 
cut  off  the  old  ray  solution  outright  (since  it  leads  to  infeasibility). 

If an unbounded solution is found, it passes a new ray one step further. Such new rays can form a path to the last subproblem, which, if also unbounded, means the entire SLP is unbounded.

Figure  4.5  summarizes  the  Benders  algorithm  for  the  2-period  SLP.  The  reader  should  imagine 
any  number  of  middle  subproblems  inserted  into  the  diagram  to  represent  the  general  case. 


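[Diagram: one box per subproblem, with labeled arcs between them and the terminal outcomes "SLP Infeasible" and "SLP Unbounded".]

Figure 4.5. Flow of the parallel decomposition algorithm.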


The  boxes  each  represent  a  subproblem  and  its  possible  solution  states  (Inf,  Unb,  and  Opt). 
The  labeled  arcs  represent  the  passed  information  based  on  subproblem  solutions,  and  the  test  for 
equilibrium  is  a  repeated  solution  to  the  second  subproblem. 

5.  THE  FACTORS  AFFECTING  SPEEDUP 

The behavior of the parallel decomposition algorithm will now be investigated using a seven-day diet problem (DIET7), generated in the same fashion as DIET2 was extended to DIET3. The three parameters (dimensions) of our behavioral study will be the number of subproblems, n, into which the seven-day problem is decomposed; the number of processors, p, used to solve the subproblems; and the order in which the subproblems are solved. In each case we will discuss the factors involved in obtaining an SLP's solution more swiftly. DIET7 turns out to be well suited for decomposition, because many of the proposed algorithm's benefits are realized in the results obtained. Not all problems are so amenable to decomposition, but we feel confident that significant speedups are often attainable on parallel computers.
5.1. The Number of Subproblems

Some  of  the  issues  involved  in  deciding  how  many  subproblems  to  create  are: 

1) the natural structure of the problem,

2) the overhead involved in decomposing,

3) the resulting sizes of the subproblems, or how long they will take to solve on average, and

4) the number of processors available.

The seven-period problem was solved as a single LP in 48 iterations using MINOS 5.1. This is the benchmark for comparing combinations of the above parameters. Along the subproblem dimension n, DIET7 was solved in 2-, 3-, and 7-subproblem decomposition schemes using a single processor. Efficiencies can sometimes be gained merely by breaking a problem into two subproblems, because the amount of work per iteration is less and even the total work of both subproblems can be less. This proved to be true for DIET7.

As a quick measure of performance, we will assume that the amount of work per iteration is proportional to the number of rows in the subproblem, a reasonable approximation for sparse linear programs. The number of iterations per subproblem is the cumulative sum of iterations in successive solves until the entire SLP is solved. Hence, the iterative work per subproblem is approximated as the product of the number of rows and the number of iterations. We can observe from Table 5.1 that the total amount of iterative work actually decreased from the single- to the double-subproblem case for DIET7.


p    n    sub #    # rows    # iterations    work
1    1      1        16           48           --
          totals     16           48           --
1    2    (entries illegible in source)
1    3      1         9           24          216
            2        13           45          585
            3         8           34           --
          totals     30          103         1073
1    7      1         7           28           --
            2         9           21           --
            3         9           18           --
            4         9           29           --
            5         9           24           --
            6         9           11           --
            7         6            7           --
          totals     58          138           --

(-- : entry illegible in source)

Table  5.1.  Total  work  as  seen  in  the  subproblem  dimension. 
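The work measure defined above (rows times cumulative iterations, summed over subproblems) can be recomputed from the legible entries of Table 5.1. The sketch reproduces the n = 3 total of 1073 and fills in the work column for n = 1 and n = 7 by the same rule. It says nothing about n = 2, whose entries are illegible. The dictionary name `TABLE_5_1` is an illustrative assumption.

```python
# Rows and cumulative iterations per subproblem, as legible in Table 5.1
# (the n = 2 entries are illegible in the source and are omitted).
TABLE_5_1 = {
    1: [(16, 48)],
    3: [(9, 24), (13, 45), (8, 34)],
    7: [(7, 28), (9, 21), (9, 18), (9, 29), (9, 24), (9, 11), (6, 7)],
}


def iterative_work(subproblems):
    """Approximate work as the sum over subproblems of rows * iterations."""
    per_sub = [rows * iters for rows, iters in subproblems]
    return per_sub, sum(per_sub)


for n, subs in TABLE_5_1.items():
    per_sub, total = iterative_work(subs)
    print(n, per_sub, total)
# 1 [768] 768
# 3 [216, 585, 272] 1073
# 7 [196, 189, 162, 261, 216, 99, 42] 1165
```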

As  the  number  of  subproblems  is  increased,  the  overhead  of  reformulating  increases  and  there 
is  a  greater  need  for  communication.  At  some  point,  overhead  and  communication  will  begin  to 
outweigh  any  benefits  associated  with  creating  more  subproblems.  Hence,  a  plot  of  the  total  work 
done  against  the  number  of  subproblems  created  should  look  qualitatively  like  Figure  5.1,  which 
attains  a  minimum  at  some  point  n*. 
[Plot: total work against the number of subproblems n.]

Figure 5.1. Total work in the subproblem dimension.

In  the  case  of  DIET7,  n*  =  2  subproblems  using  a  single  processor.  For  a  different  problem 
or  a  different  number  of  processors,  the  value  of  n*  may  be  different — possibly  even  equal  to  one. 

5.2.  The  Number  of  Processors 

Choosing  the  number  of  processors  is  subject  to  its  own  set  of  complexities.  The  design  of  any 
parallel  algorithm  is  based  upon  the  hope  that  the  incorporation  of  more  processors  will  offer 
almost linear speedup. However, the allocation of additional processors is a key issue because there are diminishing returns on investment. It is important that extra processors are used and do not sit idle. Having more processors than subproblems is an obvious case of inefficiency, but even when their numbers are equal, some processors will inevitably become idle (e.g., n = p = 2). For any given problem, there is some compromise position at which the best performance is achieved.


Table  5.2.  Total  work  as  seen  in  the  processor  dimension. 
Note in Table 5.2 that with two or four processors the total work necessary to solve the problem is less than in the single-processor case. This phenomenon is due to the order in which subproblems are solved, and will be discussed further in the next section.

5.3. Subproblem Ordering

In the present implementation of the parallel decomposition algorithm, there is no explicit control over the order in which subproblems are solved. They are solved as they are taken by idle processors from the pend queue on a first-in first-out basis. (The first in the pend queue will be the subproblem that received a message least recently.) The importance of order on solution time is demonstrated in Table 5.3, where DIET7 was solved twice with four processors on a Sequent Balance 8000. This machine has 8 CPUs, but because the processors are continually shared with other users, the order in which the subproblems are solved is not guaranteed to be the same in any two otherwise identical runs. As Table 5.3 shows, this can significantly affect the performance of the algorithm.


Table  5.3.  Total  work  as  seen  in  the  order  dimension. 


A  possible  remedy  was  previously  alluded  to  during  the  discussion  of  the  RUN  JOB  loop. 
If  the  algorithm  were  enhanced  so  that  each  job  were  put  back  into  the  pend  queue  after  some 
predetermined  number  of  iterations,  the  power  of  the  CPUs  would  be  more  evenly  distributed  over 
the  subproblems  and  the  processors  would  be  utilized  more  efficiently.  This  practice  has  the  effect 
of  incorporating  new  information  more  quickly  because  the  latest  solutions  can  be  used  to  make 
modifications  midway  through  a  solution  step.  It  also  reduces  the  time  that  subproblems  spend 
waiting  for  a  processor.  Future  work  will  include  such  an  enhancement  to  the  algorithm. 
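The proposed enhancement, returning a job to the pend queue after a fixed number of iterations, behaves like round-robin time slicing. The sketch below shows only the resulting service order for jobs whose iteration counts are known in advance. It does not model the arrival of new information, and `time_sliced_order` and `quantum` are hypothetical names.

```python
from collections import deque


def time_sliced_order(iterations_needed, quantum):
    """Order in which jobs are served under a fixed iteration quantum.

    Hypothetical sketch: a job runs for at most `quantum` iterations, then
    goes to the back of the FIFO pend queue if it still has work left.
    Returns a list of (job index, iterations run in that slice).
    """
    if quantum < 1:
        raise ValueError("quantum must be at least one iteration")
    pend = deque(enumerate(iterations_needed))
    slices = []
    while pend:
        job, remaining = pend.popleft()
        run = min(quantum, remaining)
        slices.append((job, run))
        if remaining > run:
            pend.append((job, remaining - run))  # back of the pend queue
    return slices


print(time_sliced_order([5, 2, 3], 2))
# [(0, 2), (1, 2), (2, 2), (0, 2), (2, 1), (0, 1)]
```

The long first job no longer holds a processor for all five iterations. Its work is interleaved with the other jobs, which is the more even distribution of CPU power the paragraph anticipates.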
Appendix A. THE DIET PROBLEM SOLUTION


We wish to show how an optimal solution to (2) can be fashioned from an optimal solution of the smaller LP (1).


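$$
\begin{aligned}
\text{minimize}\quad & c^T x = z \\
\pi:\quad & A x \ge b \\
\rho:\quad & a^T x \ge \beta/2 \\
& x \ge 0
\end{aligned}
\tag{1}
$$

$$
\begin{aligned}
\text{minimize}\quad & c^T x_1 + c^T x_2 = z_2 \\
\pi_1:\quad & A x_1 \ge b \\
\rho_1:\quad & a^T x_1 + a^T x_2 \ge \beta \\
\pi_2:\quad & A x_2 \ge b \\
& x_j \ge 0
\end{aligned}
\tag{2}
$$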

With $\bar x$ and $(\bar\pi, \bar\rho)$ as the optimal primal and dual solutions to (1), and $(x_1, x_2)$ and $(\pi_1, \rho_1, \pi_2)$ as the optimal primal and dual solutions to (2), we know from strong duality that

$$z = c^T \bar x = b^T \bar\pi + \beta\bar\rho/2, \tag{3}$$

and

$$z_2 = c^T x_1 + c^T x_2 = b^T \pi_1 + \beta\rho_1 + b^T \pi_2. \tag{4}$$


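a) Show that $(\bar x, \bar x)$ is primal feasible for (2).

Clearly $\bar x \ge 0$. The $\pi$ constraints of (1) imply that the $\pi_1$ and $\pi_2$ constraints of (2) are satisfied, and the $\rho$ constraint of (1) implies that $a^T \bar x \ge \beta/2$, i.e., $a^T \bar x + a^T \bar x \ge \beta$, which is the $\rho_1$ constraint of (2). ∎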

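b) Show that $(\bar\pi, \bar\rho, \bar\pi)$ is dual feasible for (2).

The dual of (2) is

$$
\begin{aligned}
\text{maximize}\quad & b^T \pi_1 + \beta\rho_1 + b^T \pi_2 = z_2 \\
x_1:\quad & A^T \pi_1 + a\rho_1 \le c \\
x_2:\quad & a\rho_1 + A^T \pi_2 \le c \\
& \pi_1 \ge 0,\ \rho_1 \ge 0,\ \pi_2 \ge 0.
\end{aligned}
$$

The dual of (1) implies that $A^T \bar\pi + a\bar\rho \le c$, $\bar\pi \ge 0$, and $\bar\rho \ge 0$, so both the $x_1$ and the $x_2$ constraints hold with $\pi_1 = \pi_2 = \bar\pi$ and $\rho_1 = \bar\rho$. ∎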

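c) Show that $(\bar x, \bar x)$, $(\bar\pi, \bar\rho, \bar\pi)$ is optimal for (2).

Knowing that these solutions are primal feasible and dual feasible for (2), it is sufficient to show that their objective values are equal, as in (4). From (3),

$$2z = c^T \bar x + c^T \bar x = b^T \bar\pi + \beta\bar\rho + b^T \bar\pi = z_2.$$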
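The argument in parts a) through c) can be checked numerically on a tiny instance. The data below are hypothetical. They were chosen so that the optimal solutions of (1), $\bar x = (1, 1)$, $\bar\pi = 1$, $\bar\rho = 1$, are easy to verify by hand. The script checks (3), and then the primal feasibility, dual feasibility, and equal objectives of the doubled solutions for (2).

```python
from fractions import Fraction


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(M, v):
    """Compute M v for a list-of-rows matrix M."""
    return [dot(row, v) for row in M]


def tmatvec(M, y):
    """Compute M^T y for a list-of-rows matrix M."""
    return [dot([row[j] for row in M], y) for j in range(len(M[0]))]


# Hypothetical instance of (1): A has one row, x has two components.
A, b = [[1, 1]], [2]
a, beta = [1, 0], 2
c = [2, 1]
# Optimal primal and dual solutions of (1) for this instance, found by hand.
x_bar, pi_bar, rho_bar = [1, 1], [1], 1
assert all(p >= 0 for p in pi_bar) and rho_bar >= 0

# Equation (3): the optimal objectives of (1) agree.
z = dot(c, x_bar)
assert z == dot(b, pi_bar) + Fraction(beta, 2) * rho_bar

# a) (x_bar, x_bar) is primal feasible for (2).
x1 = x2 = x_bar
assert all(v >= bi for v, bi in zip(matvec(A, x1), b))
assert dot(a, x1) + dot(a, x2) >= beta
assert all(v >= bi for v, bi in zip(matvec(A, x2), b))

# b) (pi_bar, rho_bar, pi_bar) is dual feasible for (2).
pi1, rho1, pi2 = pi_bar, rho_bar, pi_bar
lhs1 = [u + ai * rho1 for u, ai in zip(tmatvec(A, pi1), a)]
lhs2 = [ai * rho1 + u for ai, u in zip(a, tmatvec(A, pi2))]
assert all(l <= cj for l, cj in zip(lhs1, c))
assert all(l <= cj for l, cj in zip(lhs2, c))

# c) Equal primal and dual objectives, so both are optimal and z2 = 2z.
z2 = dot(c, x1) + dot(c, x2)
assert z2 == dot(b, pi1) + beta * rho1 + dot(b, pi2) == 2 * z
print(z, z2)  # 3 6
```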


Note that if $\bar x$ is nondegenerate, $(\bar x, \bar x)$ cannot be basic for (2), because it has an even number of variables off their bounds and (2) has an odd number of constraints.




UNCLASSIFIED

REPORT DOCUMENTATION PAGE (DD Form 1473)

Title: A Parallel Decomposition Algorithm for Staircase Linear Programs
Type of report: Technical Report
Author: Robert Entriken
Contract or grant number: N00014-85-K-0343
Performing organization: Department of Operations Research - SOL, Stanford University, Stanford, CA 94305-4022
Program element, project, task, area and work unit numbers: (illegible in source)
Controlling office: Office of Naval Research - Dept. of the Navy, 800 N. Quincy Street, Arlington, VA 22217
Report date: December 1988
Number of pages: 20
Security class (of this report): UNCLASSIFIED
Distribution statement: This document has been approved for public release and sale; its distribution is unlimited.
Key words: Linear Programming, Parallel Processors, Decomposition, Large-Scale, MINOS, Staircase
Abstract: see the abstract at the beginning of this report.

